library(terra) |> suppressPackageStartupMessages()
library(mapview)
library(here) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()Vectors
At the end of this tutorial, you will be know :
- what is a spatial vector data type
- how to create and modify one
- how to read and write the most common formats
Introduction
Typically, there are three types of vectors: addrefslide
- Points
- Lines
- Polygons
The most common formats are: addrefslide
- ESRI shapefile (.shp) : the traditional and most frequently used format
- Geopackages (.gpkg) : the new standard
- GPS Exchange Format (.gpx)
- Google Earth file (.kml / .kmz)
There are two main packages in R to work with spatial vectors: terra and sf (that replace older packages such as raster and sp). This tutorial will mostly use the package terra for smoother integration with the rest of the tutorial (especially working with rasters), but most functions have their equivalent in sf.
At the start of the tutorial, equivalent functions in sf will be shown.
Let’s start with loading the required packages.
Points
Creating spatial points from data
This is the most common case: you have coordinates from a (csv/excel/text) file and you want to get spatial information about these locations. The first step is to transform the coordinates as proper spatial object in R.
We will use a dataset from GBIF(addref) with all occurrences of otter recorded in 2021 within a 50km buffer from Montpellier, France. This data was prepared and stored on Github.
otter <- read.csv(here("data", "gbif_otter_2021_mpl50km.csv"))otter <- read.csv(
"https://github.com/FRBCesab/spatial-r/raw/main/data/gbif_otter_2021_mpl50km.csv"
)dim(otter)[1] 83 15
names(otter) [1] "key" "institutionCode"
[3] "species" "occurrenceStatus"
[5] "eventDate" "year"
[7] "month" "day"
[9] "decimalLongitude" "decimalLatitude"
[11] "elevation" "identificationVerificationStatus"
[13] "identifier" "datasetKey"
[15] "provider"
table(otter$institutionCode)
iNaturalist UAR PatriNat
7 76
The dataset contains 83 observations of otter (in rows), and 15 variables (in column) such as date, coordinates, and data provider. The geographic coordinates are stored in the columns decimalLongitude, and decimalLatitude. They are expressed in decimal degrees, with the standard datum WGS84 EPSG:4326.
Let’s create a spatial object from the coordinates.
In terra, the key function is terra::vect().
pt_terra <- vect(
otter,
geom = c("decimalLongitude", "decimalLatitude"),
crs = "EPSG:4326"
)
# summary
pt_terra class : SpatVector
geometry : points
dimensions : 83, 13 (geometries, attributes)
extent : 3.28983, 4.46447, 43.30373, 44.06548 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
names : key institutionCode species occurrenceStatus
type : <num> <chr> <chr> <chr>
values : 3.059e+09 iNaturalist Lutra lutra PRESENT
3.854e+09 UAR PatriNat Lutra lutra PRESENT
3.854e+09 UAR PatriNat Lutra lutra PRESENT
eventDate year month day elevation identificationVerificationStatus
<chr> <int> <int> <int> <int> <chr>
2021-01-24T15:~ 2021 1 24 NA NA
2021-01-13 2021 1 13 NA Probable
2021-01-08 2021 1 8 NA Probable
(and 3 more)
# visualization
plot(pt_terra)
In sf, the key function is sf::st_as_sf().
pt_sf <- st_as_sf(
otter,
coords = c("decimalLongitude", "decimalLatitude"),
crs = "EPSG:4326"
)
# summary
pt_sfSimple feature collection with 83 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3.28983 ymin: 43.30373 xmax: 4.46447 ymax: 44.06548
Geodetic CRS: WGS 84
First 10 features:
key institutionCode species occurrenceStatus eventDate
1 3058935748 iNaturalist Lutra lutra PRESENT 2021-01-24T15:26:55
2 3853886594 UAR PatriNat Lutra lutra PRESENT 2021-01-13
3 3853886619 UAR PatriNat Lutra lutra PRESENT 2021-01-08
4 3853886780 UAR PatriNat Lutra lutra PRESENT 2021-01-12
5 4546764953 UAR PatriNat Lutra lutra PRESENT 2021-01-31
6 4546968119 UAR PatriNat Lutra lutra PRESENT 2021-01-17
7 4547054813 UAR PatriNat Lutra lutra PRESENT 2021-01-31
8 4548321676 UAR PatriNat Lutra lutra PRESENT 2021-01-23
9 4548484387 UAR PatriNat Lutra lutra PRESENT 2021-01-31
10 4552966259 UAR PatriNat Lutra lutra PRESENT 2021-01-06
year month day elevation identificationVerificationStatus
1 2021 1 24 NA <NA>
2 2021 1 13 NA Probable
3 2021 1 8 NA Probable
4 2021 1 12 NA Probable
5 2021 1 31 NA Probable
6 2021 1 17 NA Probable
7 2021 1 31 NA Probable
8 2021 1 23 NA Probable
9 2021 1 31 NA Probable
10 2021 1 6 NA Probable
identifier datasetKey
1 68608477 50c9509d-22c7-4a22-a47d-8c48425ef4a7
2 5b4c3803-50a9-465a-a1a7-35c4a4d7d508 c32f3129-a4dc-4e36-86d4-a35cc5cfb04d
3 cc533347-f1e3-4cf3-9f55-2c7cf60500de c32f3129-a4dc-4e36-86d4-a35cc5cfb04d
4 3210c6f2-391a-4cc4-b497-b915bbc9d7d4 c32f3129-a4dc-4e36-86d4-a35cc5cfb04d
5 631348e0-646f-41eb-b3c7-01900054c689 256b9877-cef3-4e8e-84e1-f23299c49655
6 754868ad-58ef-41eb-b3c7-01900054452d 256b9877-cef3-4e8e-84e1-f23299c49655
7 771217f4-6470-41eb-b3c7-01900054c695 256b9877-cef3-4e8e-84e1-f23299c49655
8 b00a76e4-907d-41eb-adf1-041006068357 256b9877-cef3-4e8e-84e1-f23299c49655
9 b2343db5-646f-41eb-b3c7-01900054c693 256b9877-cef3-4e8e-84e1-f23299c49655
10 33310104-5041-41eb-bfb5-01900053f23b 256b9877-cef3-4e8e-84e1-f23299c49655
provider
1 iNaturalist Research-grade Observations
2 LIGNE NOUVELLE MONTPELLIER PERPIGNAN DUP PHASE 1 - Inventaire non protocolé Mammifères
3 LIGNE NOUVELLE MONTPELLIER PERPIGNAN DUP PHASE 1 - Inventaire non protocolé Mammifères
4 LIGNE NOUVELLE MONTPELLIER PERPIGNAN DUP PHASE 1 - Inventaire non protocolé Mammifères
5 UAR PatriNat - opportunistic data Faune-France
6 UAR PatriNat - opportunistic data Faune-France
7 UAR PatriNat - opportunistic data Faune-France
8 UAR PatriNat - opportunistic data Faune-France
9 UAR PatriNat - opportunistic data Faune-France
10 UAR PatriNat - opportunistic data Faune-France
geometry
1 POINT (3.792841 43.81636)
2 POINT (3.46399 43.38562)
3 POINT (3.28983 43.3172)
4 POINT (3.83235 43.57202)
5 POINT (3.46399 43.38562)
6 POINT (3.92099 44.06548)
7 POINT (3.85823 43.52581)
8 POINT (3.57398 43.63949)
9 POINT (3.28983 43.3172)
10 POINT (3.54517 43.75695)
# visualization
plot(pt_sf, max.plot = 1, axes = TRUE)
Handling spatial vectors
Let’s have a look at the most common functions to handle spatial vectors in R.
# get the geographical extent
ext(pt_terra)SpatExtent : 3.28983, 4.46447, 43.30373, 44.06548 (xmin, xmax, ymin, ymax)
# get number of objects, number of attributes
dim(pt_terra)[1] 83 13
# get the projection system
crs(pt_terra, describe = TRUE) name authority code area extent
1 WGS 84 EPSG 4326 World -180, 180, -90, 90
# get the coordinates
crds(pt_terra) |> head() x y
[1,] 3.792841 43.81636
[2,] 3.463990 43.38562
[3,] 3.289830 43.31720
[4,] 3.832350 43.57202
[5,] 3.463990 43.38562
[6,] 3.920990 44.06548
# manipulate the dataset
# the attached data is formatted as a standard data.frame
# you can add a new column with '$' (for instance month labels)
pt_terra$lab_month <- month.name[pt_terra$month]
# filter rows with '[]' (for instance, keep only iNaturalist observations)
iNat_terra <- pt_terra[pt_terra$institutionCode == "iNaturalist", ]
dim(iNat_terra)[1] 7 14
# get the geographical extent
st_bbox(pt_sf) xmin ymin xmax ymax
3.28983 43.30373 4.46447 44.06548
# get number of objects, number of attributes
dim(pt_sf)[1] 83 14
# get the projection system
crs(pt_sf, describe = TRUE) name authority code area extent
1 WGS 84 EPSG 4326 World -180, 180, -90, 90
# get the coordinates
st_coordinates(pt_sf) |> head() X Y
[1,] 3.792841 43.81636
[2,] 3.463990 43.38562
[3,] 3.289830 43.31720
[4,] 3.832350 43.57202
[5,] 3.463990 43.38562
[6,] 3.920990 44.06548
# manipulate the dataset
# the attached data is formatted as a standard data.frame
# you can add a new column with '$' (for instance month labels)
pt_sf$lab_month <- month.name[pt_sf$month]
# filter rows with '[]' (for instance, keep only iNaturalist observations)
iNat_sf <- pt_sf[pt_sf$institutionCode == "iNaturalist", ]
dim(iNat_sf)[1] 7 15
Projection
There are different coordinate reference systems in which coordinates can be expressed (e.g. different unit). While manipulating multiple datasets from different sources, it is important to make sure they are all in the same coordinate system. If it is not the case, then you will need make projection to convert the projection system. addrefslide
For example, in France, the preferred projection system by IGN is Lambert 93 (EPSG:2154).As an exercise, let’s project our points to EPSG:2154.
In terra, the key function is terra::project().
# projection
pt_2154_terra <- project(pt_terra, "EPSG:2154")
# see the differences in the bounding box
# extent in lat/long
ext(pt_terra)SpatExtent : 3.28983, 4.46447, 43.30373, 44.06548 (xmin, xmax, ymin, ymax)
# new extent in Lambert-93
ext(pt_2154_terra)SpatExtent : 723524.638618867, 818474.392571545, 6245000.52485334, 6330038.9258292 (xmin, xmax, ymin, ymax)
In sf, the key function is sf::st_transform().
#projection
pt_2154_sf <- st_transform(pt_sf, crs = "EPSG:2154")
# see the differences in the bounding box
# extent in lat/long
st_bbox(pt_sf) xmin ymin xmax ymax
3.28983 43.30373 4.46447 44.06548
# new extent in Lambert-93
st_bbox(pt_2154_sf) xmin ymin xmax ymax
723524.6 6245000.5 818474.4 6330038.9
Export the data
For vectors, it is recommended to export them as geopackage file (extension .gpkg). Compare to traditional ESRI shapefile, the geopackage format store the data in a single file and the column names are preserved. (addref slide file format).
In terra, the function is terra::writeVector().
writeVector(pt_terra, here("data", "gbif_otter_2021_mpl50km.gpkg"))In sf, the function is sf::write_sf() or sf::st_write. (why two options?)
write_sf(pt_sf, here("data", "gbif_otter_2021_mpl50km.gpkg"))Conversion between sf and terra
It is confusing to have two dominant packages terra and sf having similar functionalities. Luckily, the conversion between these two format is easy.
To convert a terra vector into sf, the function is sf::st_as_sf().
pt_sf <- st_as_sf(pt_terra)To convert a sf feature into terra, the function is terra::vect(). Be careful that non-homogeneous features (mix of points, lines and polygons) can’t be converted as a terra SpatVector object. Only homogeneous points, line or polygons can be converted into terra geometry.
pt_terra <- vect(pt_sf)Lines
Lines are another type of vectors, made of multiple points. It is possible to create lines but in practice, it often comes from an existing spatial dataset. In our example case study, we will load the rivers for the area of interest from BD CARTO (addref).
Load lines from a geopackage file
river <- vect(here("data", "BDCARTO-River_mpl50km.gpkg"))river <- vect(
"https://github.com/FRBCesab/spatial-r/raw/refs/heads/main/data/BDCARTO-River_mpl50km.gpkg"
)Handling spatial vectors
# get the number of objects, and the number of attributes
dim(river)[1] 1500 8
# get the projection system
crs(river, describe = TRUE) name authority code area extent
1 WGS 84 EPSG 4326 World -180, 180, -90, 90
# see a subset of the data
head(river, 3) id cleabs code_hydrographique statut
1 cours_d_eau.46 COURDEAU0000002276554466 06C0000002276554466 Validé
2 cours_d_eau.52 COURDEAU0000002215481117 06C0000002215481117 Validé
3 cours_d_eau.59 COURDEAU0000002000818257 06C0000002000818257 Validé
toponyme statut_du_toponyme influence_de_la_maree
1 la Dourbie Validé <NA>
2 Ruisseau de Montmorency Validé <NA>
3 Ruisseau de Lussac Validé <NA>
caractere_permanent
1 <NA>
2 <NA>
3 <NA>
Calculate length of lines
The function terra::perim() calculate the length of lines (expressed in meter). While calculating distance and length, be careful with projection systems. Some are not suited to calculate distance. Prefer equidistant projections or use local projection system (if your study area is small). The package terra recommends the calculation of distances in lat/long to get more accurate results (considering the geodesic distance, so accounting for Earth’s curvature). (addrefslide)
# calculate the length of rivers
river$length_km <- perim(river) / 1000
# see the distribution of river length
boxplot(river$length_km, ylab = "length (km)")
# get the name of the longest river
river$toponyme[which.max(river$length_km)][1] "l'Hérault"
Visualization
mapview(river, z = "length_km") +
mapview(pt_terra, col.regions = "red", color = NA)plot(river, y = "length_km", type = "continuous")
plot(pt_terra, col = "red", add = TRUE)
Polygons
Load polygons from a shapefile
landuse <- vect(here("data", "BDCARTO-LULC_mpl50km.shp"))landuse <- vect(
"https://github.com/FRBCesab/spatial-r/raw/refs/heads/main/data/BDCARTO-LULC_mpl50km.shp"
)Handling spatial vectors
# get the number of objects, and the number of attributes
dim(landuse)[1] 2384 3
# get the projection system
crs(landuse, describe = TRUE) name authority code area extent
1 WGS 84 EPSG 4326 <NA> NA, NA, NA, NA
# see a subset of the data
head(landuse, 3) id cleabs nature
1 occupation_du_sol.32 BDC_OCS_0000002201861697 Broussailles
2 occupation_du_sol.68 BDC_OCS_0000002201863362 Bâti
3 occupation_du_sol.98 BDC_OCS_0000002201860189 Eau libre
# see the distribution of land cover classes (number of polygons)
table(landuse$nature)
Bâti Broussailles Carrière, décharge Eau libre
360 235 35 49
Forêt Marais salant Marais, tourbière Prairie
455 8 54 707
Rocher, éboulis Sable, gravier Vigne, verger Zone d'activités
1 18 316 146
Visualization
mapview(landuse, z = "nature")plot(landuse, y = "nature", type = "classes")
Calculate area
The function expanse calculate the area in \(m^2\). Again, be careful with projection systems (addrefslide). Some are not suited to calculate areas. Prefer equalarea projections or use local projection system (if your study area is small). The package terra recommends the calculation of areas in lat/long to get more accurate results (accounting for Earth’s curvature).
# calculate area of polygons
landuse$area_km2 <- expanse(landuse) * 0.000001
# see area per land use classes
tapply(landuse$area_km2, landuse$nature, sum) Bâti Broussailles Carrière, décharge Eau libre
474.8521879 891.4256281 16.2444372 2799.1424086
Forêt Marais salant Marais, tourbière Prairie
1804.7059831 33.8696654 236.7411376 1607.4753153
Rocher, éboulis Sable, gravier Vigne, verger Zone d'activités
0.3778454 18.7532616 1752.9425395 137.2554131